Search results for "Automatic speech"


InspirationWall

2015

Collaborative idea generation leverages social interactions and knowledge sharing to spark diverse associations and produce creative ideas. Information exploration systems expand the current context by suggesting novel but related concepts. In this paper, we introduce InspirationWall, an unobtrusive display that leverages speech recognition and information exploration to enhance an ongoing idea generation session with automatically retrieved concepts that relate to the conversation. We evaluated the system in six 20-minute idea generation sessions with groups of two people. Preliminary results suggest that InspirationWall counteracts the decay of idea productivity over time and can t…

Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni; Information Exploration; Settore INF/01 - Informatica; Computer science; media_common.quotation_subject; Context (language use); Automatic Speech Recognition; Ideation; Idea generation; Session (web analytics); Knowledge sharing; SPARK (programming language); Human–computer interaction; Conversation; Information exploration; computer; computer.programming_language; media_common
Proceedings of the 2015 ACM SIGCHI Conference on Creativity and Cognition

Keyword Based Keyframe Extraction in Online Video Collections

2015

Keyframe extraction methods aim to find the most significant frames in a video sequence according to specific criteria. In this paper, we propose a new method that searches a video database for frames related to a given keyword and extracts the best ones according to a proposed quality factor. We first apply a speech-to-text algorithm to extract automatic captions from all the videos in a domain-specific database. We then select only those sequences (clips) whose captions include the given keyword, thus discarding much information that is useless for our purposes. Each retrieved clip is then divided into shots using a video segmentation method that is based on the SURF d…

Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni; Information retrieval; business.industry; Computer science; media_common.quotation_subject; Shot (filmmaking); InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; Frame (networking); ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Pattern recognition; Domain (software engineering); Factor (programming language); Metric (mathematics); Quality (business); Segmentation; Artificial intelligence; business; computer; Sentence; media_common; computer.programming_language; Video Summarization; Keyframe Extraction; Automatic Speech Recognition; YouTube; Multimedia Collections

Using privacy-transformed speech in the automatic speech recognition acoustic model training

2020

Automatic Speech Recognition (ASR) requires huge amounts of real user speech data to reach state-of-the-art performance. However, speech data conveys sensitive speaker attributes, such as identity, that can be inferred and exploited for malicious purposes. Therefore, there is interest in collecting anonymized speech data that has been processed by a voice conversion method. In this paper, we evaluate one such voice conversion method on Latvian speech data and also investigate whether privacy-transformed data can be used to improve ASR acoustic models. Results show the effectiveness of voice conversion against state-of-the-art speaker verification models on Latvian speech and the effectivene…

Speaker verification; evaluation; voice conversion; Computer science; Speech recognition; automatic speech recognition; Latvian; Acoustic model; [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG]; privacy; language.human_language; [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]; anonymization; Identity (object-oriented programming); language; Conversion method; automatic speaker verification

ASR in Classroom Today : Automatic Visualization of Conceptual Network in Science Classrooms

2017

The field of Automatic Speech Recognition (ASR) has improved substantially in recent years. We are at a point never seen before, where we can apply such algorithms in non-ideal conditions such as real classrooms. In these scenarios it is still not possible to reach perfect recognition rates; however, we can already take advantage of these improvements. This paper shows preliminary results of using ASR in Chilean and Finnish middle and high schools to automatically provide teachers with a visualization of the structure of the concepts present in their discourse in science classrooms. These visualizations are conceptual networks that relate key concepts used by the teacher. This is an interesting tool that gives…

Structure (mathematical logic); puheentunnistus; Point (typography); Multimedia; Computer science; koulut (oppilaitokset); 05 social sciences; automatic speech recognition; 050301 education; 02 engineering and technology; conceptual network; computer.software_genre; opetus; Field (computer science); Visualization; Conceptual network; 020204 information systems; Similarity (psychology); 0202 electrical engineering electronic engineering information engineering; Key (cryptography); ComputingMilieux_COMPUTERSANDEDUCATION; teacher discourse; classroom dialogue; 0503 education; computer

Glottal Source Features for Automatic Speech-Based Depression Assessment

2017

Depression is one of the most prominent mental disorders, with an increasing rate that makes it the fourth cause of disability worldwide. The field of automated depression assessment has emerged to aid clinicians in the form of a decision support system. Such a system could assist as a pre-screening tool, or even for monitoring high risk populations. Related work most commonly involves multimodal approaches, typically combining audio and visual signals to identify depression presence and/or severity. The current study explores categorical assessment of depression using audio features alone. Specifically, since depression-related vocal characteristics impact the glottal source signal, we exa…

machine learning; Computer science; Speech recognition; glottal source; 0202 electrical engineering electronic engineering information engineering; Automatic speech; Phase Distortion Deviation; 020206 networking & telecommunications; 020201 artificial intelligence & image processing; 02 engineering and technology; binary classification; Depression (differential diagnoses)
Interspeech 2017